This page provides a compact, data-driven overview of Kampala and its surrounding region to support CLARS training and discussion. We compile several remotely sensed and derived spatial datasets, visualize key indicators over time, and use exploratory modeling and mapping to examine neighborhood-scale patterns relevant to climate risk, urban development, and informal settlement dynamics. The goal is to support scenario-based discussion rather than produce definitive predictions.
The Groundswell Africa: A Deep Dive on Internal Climate Migration in Uganda report projects that climate-related in- and out-migration hotspots in Uganda will begin to emerge as early as 2030, intensifying and spreading geographically in the decades that follow (see figure below). These hotspots reflect areas where population movements are consistently projected across multiple modeled scenarios.
The driving forces behind these shifts include declining ecosystem viability—particularly due to water stress, reduced crop yields, and broader landscape degradation. Given the region’s rapid population growth, most areas will not lose population outright. Instead, climate impacts will dampen growth in affected regions, especially where agricultural productivity and water availability are most strained.
The Key Question:
Where might new settlements emerge under current planning frameworks if this area were to receive climate migrants? If approximately 300,000 people were to arrive over the next 30 years, where could they be accommodated, and which areas should be avoided due to environmental or infrastructural risk?
This website presents data and analyses for Kampala and the surrounding region to explore how the city might change if it were to become a destination for climate migration. We focus on current development trends—where growth is occurring, whether it reflects outward expansion or infill, and how environmental and infrastructural conditions may shape future development pathways. The analyses are designed to support learning, discussion, and planning conversations rather than to generate precise forecasts.
We have aimed to make both the data and the analyses easy to read and explore, while also providing the underlying code used to generate the visualizations and results. You can click the “Show Code” button throughout the site to reveal the code sections associated with each analysis.
We begin by loading the required R libraries, which provide the functions needed for specific tasks such as data processing, spatial analysis, and mapping. If you are new to R or unfamiliar with libraries, the following resources provide helpful introductions:
R overview: https://cran.r-project.org/doc/manuals/r-release/R-intro.html
Climate-related migration does not occur in isolation from existing urban systems. Where new residents settle is shaped by a combination of environmental constraints, infrastructure access, land availability, and prior development patterns. In rapidly growing cities like Kampala, these factors interact with informal housing dynamics and uneven planning capacity, often amplifying exposure to flood risk and environmental stress.
In this analysis, we treat past development patterns as an empirical signal of how these forces have interacted historically. By examining where development has occurred—and under what conditions—we can explore how similar pressures might shape future settlement patterns if Kampala experiences sustained in-migration related to climate stress elsewhere in Uganda.
We assembled a comprehensive set of spatial datasets for Kampala and the surrounding region to characterize population distribution, urban form, infrastructure, and environmental conditions. These data include high-resolution population density estimates, building footprint datasets, and mapped informal housing. To capture accessibility and environmental constraints, we incorporate road and waterway networks and derived distance surfaces (e.g., Euclidean distance to the CBD). Land-use and surface characteristics are represented through residential layers at 100 m resolution (2005–2025, 5-year intervals) and impervious surface estimates derived from remote sensing. Topographic and hydrological conditions are captured using DEMs and flow-direction products.
| Data type | Year / Period | Source |
|---|---|---|
| Population | 2020 | HDX – Population Density |
| Building footprints | — | gloBFPr (GitHub) |
| Informal housing | 2019 | UChicago Box – Informal Housing |
| Water bodies (cost distance to roads) | — | HOTOSM – Waterways |
| Roads (cost distance to roads) | — | HOTOSM – Roads |
| Euclidean distance to CBD | — | Derived |
| Residential land use (100 m) | 2005–2025 (5-year intervals) | https://human-settlement.emergency.copernicus.eu/download.php |
| Impervious surface | 2005–2025 (annual) | Google Earth Engine |
| Central Business District (CBD) | — | Local definition |
| Digital Elevation Model (DEM) | 2017 | USGS ScienceBase – DEM |
| Flow direction | 2017 | Derived from DEM |
What do these data look like? What are the spatial patterns?
To begin, let's examine an animation of Kampala and the surrounding area that shows change in residential built-up surface from 2005 to 2025 at five-year intervals. In the residential raster data, each pixel represents a 100 m x 100 m area (10,000 m²), and its value (ranging from 0 to 10,000) indicates how many square meters of that cell are covered by residential built-up surface. For example, a value of 5,000 means half the cell is built up for residential use. The values are absolute areas (not percentages), and 65535 denotes NoData. Source: https://human-settlement.emergency.copernicus.eu/download.php
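As a quick check of the encoding described above, the sketch below converts raw cell values into the share of each 100 m cell covered by residential built-up surface, treating 65535 as NoData. The input values are made up for illustration; with terra, the same arithmetic can be applied directly to a SpatRaster.

```r
# Made-up raw cell values from a residential built-up raster (m^2 per 100 m cell)
raw_values <- c(0, 2500, 5000, 10000, 65535)

cell_area_m2 <- 100 * 100   # each pixel covers 10,000 m^2
nodata_value <- 65535       # NoData code described in the text

# Replace NoData with NA before any arithmetic or summary statistics
built_m2 <- ifelse(raw_values == nodata_value, NA, raw_values)

# Share of each cell covered by residential built-up surface (0 to 1)
built_fraction <- built_m2 / cell_area_m2
built_fraction

# Unmasked vs. masked maximum
max(raw_values)
max(built_m2, na.rm = TRUE)
```

Masking NoData first matters for any summary across layers, such as a fixed legend range for an animation: an unmasked maximum would be 65535 rather than 10,000.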
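library(terra)
# Load the five residential built-up rasters (2005-2025, 5-year intervals)
r1 <- rast("data/residential_2005.tif")
r2 <- rast("data/residential_2010.tif")
r3 <- rast("data/residential_2015.tif")
r4 <- rast("data/residential_2020.tif")
r5 <- rast("data/residential_2025.tif")
# Combine into a multi-layer SpatRaster, one layer per year
r <- c(r1, r2, r3, r4, r5)
# Treat 65535 as NoData in case it is not already flagged in the file metadata
r <- classify(r, cbind(65535, NA))
# Check
#r
#nlyr(r)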
# library(gifski)
#
# names(r) <- c("2005","2010","2015","2020","2025")
#
# # ---- FIXED GLOBAL RANGE (KEY PART) ----
# # Compute once across ALL layers
# global_min <- min(values(r), na.rm = TRUE)
# global_max <- max(values(r), na.rm = TRUE)
# fixed_range <- c(global_min, global_max)
#
# # ---- RENDER FRAMES ----
# tmpdir <- file.path(tempdir(), "frames_fixed_range")
# dir.create(tmpdir, showWarnings = FALSE)
# png_files <- file.path(tmpdir, sprintf("frame_%03d.png", 1:nlyr(r)))
#
# for (i in 1:nlyr(r)) {
#   png(png_files[i], width = 900, height = 700, res = 120)
#   plot(
#     r[[i]],
#     main = names(r)[i],
#     range = fixed_range,  # <-- legend is now identical for all frames
#     axes = FALSE
#   )
#   dev.off()
# }
#
# # ---- CREATE GIF ----
# gif_file <- "raster_animation_fixed_legend.gif"
# gifski(
#   png_files,
#   gif_file,
#   width = 900,
#   height = 700,
#   delay = 0.25
# )
#
# gif_file
These patterns reflect built-up residential surface rather than specific building types, so changes may include housing, mixed-use development, or other residential-related structures.
knitr::include_graphics("raster_animation_fixed_legend.gif")

Key question:
As you view the animation, where do you observe the most pronounced changes, and do these patterns align with your experience or understanding of how the city has developed? Do the changes look like residential housing, commercial buildings, or roads?
The variables below are not exhaustive, but they represent commonly used indicators in urban geography and land-change research that capture physical constraints, accessibility, and existing development intensity. Environmental conditions and existing infrastructure often play a key role in shaping where urban development occurs, so we assembled several datasets that capture environmental constraints and infrastructural context. Below, we highlight a selection that is particularly relevant from an urban geography perspective.
Elevation may help explain development patterns by influencing construction costs on uneven terrain, shaping efforts to avoid flood-prone areas, or attracting development to locations that offer favorable views and environmental conditions.
library(terra)
library(stars)
library(ggplot2)
# Load single-layer rasters: elevation, building presence,
# building height, and flow direction
r6 <- rast("data/Kampala_DEM.tif")
r7 <- rast("data/kampala_building_presence.tif")
r8 <- rast("data/kampala_building_height.tif")
r9 <- rast("data/Kampala_flowdirection.tif")
names(r6) <- "Elevation"
names(r7) <- "Buildings"
names(r8) <- "Building Height"
names(r9) <- "Water flow"
plot(r6)

As noted above, local hydrology can strongly influence development patterns. Water flow may impede development because of wet or unstable ground, pose public health risks where standing water supports mosquito breeding, or, conversely, attract settlement by increasing residents' access to water. The water-flow layer is derived from a Digital Elevation Model (DEM) by combining flow direction and flow accumulation, which together describe how water moves across the landscape.
plot(r9)
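The flow-direction values that appear later as model categories (1, 2, 4, ..., 128) appear to follow the common D8 convention, in which each cell drains toward whichever of its eight neighbours has the steepest downhill slope and the direction is stored as a power of two. The base-R sketch below illustrates that idea on a tiny synthetic DEM. It is a teaching simplification, not the tool used to produce the Kampala layer: border cells are left as NA, flats and pits are not handled, and it assumes the encoding used by ESRI and by terra (1 = east, then clockwise to 128 = northeast).

```r
# D8 neighbour offsets and codes (ESRI / terra convention).
# Row offsets are positive toward the south because row 1 is the top of the grid.
d8 <- data.frame(
  code = c(1, 2, 4, 8, 16, 32, 64, 128),
  dir  = c("E", "SE", "S", "SW", "W", "NW", "N", "NE"),
  dr   = c(0, 1, 1, 1, 0, -1, -1, -1),
  dc   = c(1, 1, 0, -1, -1, -1, 0, 1)
)

# Steepest-descent direction for every interior cell of an elevation matrix
d8_flowdir <- function(dem, cellsize = 1) {
  out <- matrix(NA_real_, nrow(dem), ncol(dem))
  dist <- cellsize * sqrt(d8$dr^2 + d8$dc^2)  # diagonal neighbours are farther away
  for (i in 2:(nrow(dem) - 1)) {
    for (j in 2:(ncol(dem) - 1)) {
      neighbours <- dem[cbind(i + d8$dr, j + d8$dc)]
      slope_to <- (dem[i, j] - neighbours) / dist  # positive = downhill
      out[i, j] <- d8$code[which.max(slope_to)]
    }
  }
  out
}

# Synthetic 5 x 5 DEM that drops 10 m per cell toward the east
dem_east <- outer(1:5, 1:5, function(row_i, col_j) 100 - 10 * col_j)
d8_flowdir(dem_east)

# Decode a code into a compass direction
d8$dir[match(4, d8$code)]
```

Because these codes label directions rather than measure quantities, treating `flow_direction` as a factor (as the model code does) is more appropriate than treating it as a number.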

We also use building footprint data to help characterize urban density across the city.
plot(r7)
Building height may also help us understand the city's population density, since taller buildings can house more people on the same footprint. Here we map average building height.
plot(r8)

The visualizations above help us build intuition about spatial patterns of growth, risk, and infrastructure across Kampala. However, maps alone do not tell us which factors are most strongly associated with development, nor how these factors interact.
To move beyond description, we shift to an exploratory modeling approach. This allows us to formally examine relationships between development and multiple spatial indicators simultaneously, while still maintaining transparency and interpretability. Importantly, this step is intended to support learning and discussion—not to generate definitive predictions.
To analyze development systematically, we translate heterogeneous spatial datasets into a common analytical framework. This allows us to compare environmental, infrastructural, and demographic conditions across space and relate them to observed development outcomes.
To move from visualization to modeling, we need to define “development.” This is challenging because development can take many forms—greenfield expansion, infill, redevelopment, or densification. Here, we adopt a simplified, binary definition to support exploratory analysis.
To analyze where development is happening in Kampala, we simplified the data by creating a regular hexagonal grid. This approach allowed us to integrate datasets with different spatial resolutions and data types within a common analytical framework. We used zonal statistics to summarize raster-based variables and aggregated vector data on housing and infrastructure within each grid cell, compiling the results into a single spatial dataset.
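The zonal-statistics step can be illustrated with a small base-R sketch. It rests on simplifying assumptions: a 6 x 6 matrix stands in for a fine-resolution raster, and square 3 x 3 blocks stand in for the hexagons used in this analysis. For real polygons, packages such as terra perform this step (for example `terra::extract()` with a summary function passed to its `fun` argument).

```r
# Toy fine-resolution "raster": built-up surface (m^2) in each of 6 x 6 cells
fine <- matrix(c(
      0,     0,     0, 5000, 5000, 5000,
      0,     0,     0, 5000, 5000, 5000,
      0,     0,     0, 5000, 5000, 5000,
  10000, 10000, 10000, 2000, 2000, 2000,
  10000, 10000, 10000, 2000, 2000, 2000,
  10000, 10000, 10000, 2000, 2000, 2000
), nrow = 6, byrow = TRUE)

block <- 3  # hypothetical: each analysis cell covers 3 x 3 fine cells

# Assign every fine cell to the zone (analysis cell) it falls in
zone_row <- (row(fine) - 1) %/% block
zone_col <- (col(fine) - 1) %/% block
zone_id  <- zone_row * (ncol(fine) %/% block) + zone_col + 1

# Zonal statistics: summarise fine-cell values within each zone
zonal_mean <- tapply(as.vector(fine), as.vector(zone_id), mean, na.rm = TRUE)
zonal_sum  <- tapply(as.vector(fine), as.vector(zone_id), sum, na.rm = TRUE)
zonal_mean
zonal_sum
```

The choice of summary (mean, sum, maximum) depends on the variable: totals suit counts such as population, while means suit intensities such as elevation or built-up share.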
Below, you can explore several key variables, including population, flood risk, estimates of informal housing, and changes in impervious surface over time. Additional datasets were also compiled and are explored in subsequent sections.
g <- sf::st_read("data/hexagon_with_impervious.gpkg")
## Reading layer `hexagon_with_impervious' from data source
## `C:\Users\dbvanber\Downloads\climate-migration-training-rmd-site\data\hexagon_with_impervious.gpkg'
## using driver `GPKG'
## Simple feature collection with 4464 features and 43 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 415514.4 ymin: -868.2945 xmax: 478014.4 ymax: 61774.22
## Projected CRS: WGS 84 / UTM zone 36N
library(leaflet)
# Reproject to WGS 84 longitude/latitude for web mapping
g <- sf::st_transform(g, crs = 4326)
vars <- c("population", "flood_area", "informal_level_avg",
          "avg_people_per_building", "delta_imperv_05_25")
# One colour palette per variable
pal_names <- c(
  population = "viridis",
  flood_area = "magma",
  informal_level_avg = "plasma",
  avg_people_per_building = "cividis",
  delta_imperv_05_25 = "inferno"
)
m <- leaflet(g) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addScaleBar(position = "bottomleft")
# Add one polygon layer per variable, each coloured on its own scale
for (v in vars) {
  stopifnot(v %in% names(g))
  vals <- g[[v]]
  pal <- colorNumeric(pal_names[[v]], domain = vals, na.color = "transparent")
  m <- m %>%
    addPolygons(
      fillColor = pal(vals),
      fillOpacity = 0.6,
      color = NA,
      weight = 1,
      opacity = 0.4,
      group = v,
      popup = paste0("<b>", v, ":</b> ", signif(vals, 4))
    )
}
m <- m %>%
  addLayersControl(
    baseGroups = vars,
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  hideGroup(vars[-1])  # keep layers mutually exclusive, show first by default
m
To explore potential patterns of future development, we use the compiled spatial indicators to examine how environmental conditions, infrastructure, and existing urban form are associated with where development has occurred in the past. By identifying relationships between these factors and observed development patterns, the analysis helps highlight areas that may be more or less suitable for future growth. This provides an initial, exploratory framework for considering how Kampala might accommodate additional population while minimizing environmental risk and infrastructure strain.
Together, these variables allow us to examine which conditions are associated with past development patterns, offering insight into where development pressure may be higher or lower under similar conditions in the future.
Before modeling development, we must first define what we mean by “development.” This is inherently challenging because development occurs in many different forms across a city. It may include large-scale expansion onto previously agricultural, forested, or natural land; redevelopment of already urbanized areas; incremental infill within existing neighborhoods; or, in some cases, the loss or replacement of existing urban structures. Capturing all of these processes within a single definition is difficult, particularly for exploratory analysis and teaching purposes.
To simplify the analysis while retaining interpretability, we adopt a binary definition of development. For each spatial unit, an area is classified as developed if it experienced a measurable increase in built-up or impervious surface over the study period, and not developed otherwise. This simplified definition allows us to focus on identifying broad associations between development and environmental, infrastructural, and demographic conditions, rather than attempting to model the full complexity of urban change.
While this approach does not distinguish between different types of development (e.g., infill versus greenfield expansion), it provides a transparent and reproducible starting point for exploring how past development patterns relate to underlying spatial conditions.
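The binary rule can be sketched as follows. This is a hypothetical illustration: the column names, the start and end values, and the 5-percentage-point threshold for a "measurable" increase are assumptions chosen for the example. In the code that follows, `is_developed` is instead taken from a precomputed `hex_class` column.

```r
# Hypothetical impervious-surface share per hexagon at the start and end of the period
hex <- data.frame(
  hex_id       = 1:5,
  imperv_start = c(0.10, 0.40, 0.05, 0.70, 0.00),
  imperv_end   = c(0.10, 0.55, 0.30, 0.69, 0.02)
)

# Hypothetical threshold for a "measurable" increase (5 percentage points)
min_increase <- 0.05

# Developed (1) if the increase exceeds the threshold, otherwise not developed (0)
hex$delta_imperv <- hex$imperv_end - hex$imperv_start
hex$is_developed <- as.integer(hex$delta_imperv > min_increase)
hex
```

Hexagon 5 shows how the threshold matters: a small gain of 2 percentage points is classified as not developed, and a different threshold would change which hexagons count as developed.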
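library(dplyr)
library(sf)
# Binary outcome and transformed predictors
# (hex_class is read from the input GeoPackage; its derivation is not shown here)
g <- g %>%
  mutate(
    is_developed = ifelse(hex_class == "Developed", 1, 0),
    log_Area = log(Area + 1),
    log_flood = log(flood_area + 1),
    # Log change in residential built-up surface, 2005-2020, used as a proxy for past growth
    pop_growth = log(residential_2020 + 1) - log(residential_2005 + 1)
  )
# Flow direction codes are categories, not quantities, so store them as a factor
if ("flow_direction" %in% colnames(g)) {
  g$flow_direction <- factor(g$flow_direction)
}
# Standardize numeric predictors (mean 0, SD 1)
num_vars <- c(
  "log_Area", "euclidean_CBD", "dist_road_30m",
  "DEM", "slope", "avg_people_per_building", "log_flood", "pop_growth"
)
num_data <- g %>%
  st_drop_geometry() %>%
  select(all_of(num_vars))
# Keep numeric columns with at least one non-missing value (drop = FALSE keeps a data frame)
num_data <- num_data[, sapply(num_data, is.numeric), drop = FALSE]
num_data <- num_data[, colSums(!is.na(num_data)) > 0, drop = FALSE]
num_data_scaled <- scale(num_data)
g[, colnames(num_data_scaled)] <- num_data_scaled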
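The `scale()` call above standardizes each numeric predictor to a z-score: it subtracts the variable's mean and divides by its standard deviation. This puts variables measured in metres, people per building, and log units on a common scale, so their coefficients can be compared as changes in log-odds per one-standard-deviation increase. A minimal check on made-up distances:

```r
# Made-up distances to the CBD (km) for five hexagons
dist_cbd_km <- c(2, 5, 8, 12, 23)

z_scale  <- as.vector(scale(dist_cbd_km))                        # base R standardization
z_manual <- (dist_cbd_km - mean(dist_cbd_km)) / sd(dist_cbd_km)  # same thing, by hand

all.equal(z_scale, z_manual)
round(c(mean = mean(z_scale), sd = sd(z_scale)), 10)
```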
Now that we've processed the data, defined development (albeit in a simplified way), and assembled variables that may help explain development over the past 20 years, we're ready to estimate our model. We'll use observations of development and non-development, along with their corresponding environmental features, to model which variables are associated with the probability that development occurs. Because our outcome is binary (presence vs. absence of development), we use a binomial logistic regression model. You can read more about logistic regression here:
- UCLA IDRE: Negative Binomial Regression in R
- R Documentation: glm() – Generalized Linear Models
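Logistic regression links each predictor to the log-odds of development: logit(p) = log(p / (1 - p)) = b0 + b1*x1 + b2*x2 + ... . To see what `glm()` estimates, the sketch below simulates synthetic data from a known relationship, logit(p) = -1 + 2x, and checks that `glm()` with `family = binomial` recovers coefficients close to those values. The data are synthetic and unrelated to Kampala.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)                       # a single standardized predictor
p <- plogis(-1 + 2 * x)             # true probability via the inverse logit
y <- rbinom(n, size = 1, prob = p)  # observed 0/1 outcome (developed or not)

fit <- glm(y ~ x, family = binomial)
coef(fit)  # estimates should be close to -1 and 2

# Predicted probability for x = 0, on the probability (response) scale
predict(fit, newdata = data.frame(x = 0), type = "response")
```

With real data the "true" coefficients are unknown, which is why the model below is evaluated on held-out locations rather than against known values.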
To evaluate how well the model generalizes, we’ll split the data into training and testing sets and assess performance on locations the model has not seen. This is standard best practice for predictive modeling.
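`caret::createDataPartition()` samples within the levels of its outcome when the outcome is a factor, so that the share of developed hexagons is similar in the training and test sets. The base-R sketch below shows that stratified-split idea with made-up labels; it is an illustration, not caret's internal code.

```r
set.seed(123)
# Made-up outcome: 100 hexagons, 10 of them developed
is_developed <- c(rep(1, 10), rep(0, 90))

p_train <- 0.7
# Sample 70% of the row indices separately within each outcome class
train_idx <- unlist(lapply(split(seq_along(is_developed), is_developed), function(idx) {
  idx[sample.int(length(idx), size = round(p_train * length(idx)))]
}))

train_y <- is_developed[train_idx]
test_y  <- is_developed[-train_idx]
c(train_share = mean(train_y), test_share = mean(test_y))
```

Stratifying matters most when one outcome is rare: a purely random split can leave the test set with very few developed hexagons, which makes performance estimates noisy.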
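library(dplyr)
library(caret)
# Variables used in the model
model_vars <- c(
  "is_developed", "log_Area", "pop_growth", "euclidean_CBD", "dist_road_30m",
  "slope", "avg_people_per_building", "DEM", "log_flood"
)
if ("flow_direction" %in% colnames(g)) {
  model_vars <- c(model_vars, "flow_direction")
}
# Keep only hexagons with complete data for all model variables
hex_model <- g %>%
  filter(if_all(all_of(model_vars), ~ !is.na(.x)))
# 70/30 train/test split
# Note: because is_developed is numeric 0/1, createDataPartition() treats it as
# continuous and groups it by percentiles; wrapping it in factor() would
# stratify the split by outcome class instead.
set.seed(123)
train_index <- createDataPartition(hex_model$is_developed, p = 0.7, list = FALSE)
train_data <- hex_model[train_index, ]
test_data <- hex_model[-train_index, ]
# Keep test-set factor levels consistent with the training set
if ("flow_direction" %in% colnames(g)) {
  train_data$flow_direction <- droplevels(train_data$flow_direction)
  test_data$flow_direction <- factor(test_data$flow_direction,
                                     levels = levels(train_data$flow_direction))
}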
The goal of this model is not to predict the exact locations of future development, but to examine which factors are associated with development under past conditions. The resulting probabilities should be interpreted as relative likelihoods—useful for comparison and scenario-based discussion—rather than as site-specific forecasts.
The independent variables were selected to represent key demographic, spatial, and environmental factors commonly associated with urban development. Total developable area captures the physical capacity for new development, while past population growth reflects demand-side pressure related to in-migration and household expansion. Distance to the central business district represents accessibility to employment, services, and amenities, and distance to the nearest road captures transportation access and potential development costs. Land slope and digital elevation represent topographic constraints that can affect construction feasibility, while flood-prone area and flow direction characterize exposure to hydrological risk that may discourage or redirect development. Finally, the average number of people per building serves as a proxy for existing development intensity and housing pressure, which can influence the likelihood of additional development in nearby areas.
logit_model_r <- glm(
is_developed ~
log_Area +
pop_growth +
poly(euclidean_CBD, 2) +
dist_road_30m +
slope +
avg_people_per_building +
DEM +
log_flood +
flow_direction,
data = train_data,
family = binomial
)
summary(logit_model_r)
##
## Call:
## glm(formula = is_developed ~ log_Area + pop_growth + poly(euclidean_CBD,
## 2) + dist_road_30m + slope + avg_people_per_building + DEM +
## log_flood + flow_direction, family = binomial, data = train_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.74039 0.48668 -11.795 < 2e-16 ***
## log_Area 2.75297 0.24280 11.338 < 2e-16 ***
## pop_growth -0.02429 0.19249 -0.126 0.8996
## poly(euclidean_CBD, 2)1 -104.03169 16.25062 -6.402 1.54e-10 ***
## poly(euclidean_CBD, 2)2 -64.81810 15.31816 -4.231 2.32e-05 ***
## dist_road_30m -4.62718 3.19968 -1.446 0.1481
## slope 0.10365 0.19972 0.519 0.6038
## avg_people_per_building -1.30692 0.21679 -6.028 1.66e-09 ***
## DEM -0.04835 0.17958 -0.269 0.7878
## log_flood -0.22270 0.10006 -2.226 0.0260 *
## flow_direction2 0.77655 0.41354 1.878 0.0604 .
## flow_direction4 0.45186 0.30651 1.474 0.1404
## flow_direction8 0.59941 0.34460 1.739 0.0820 .
## flow_direction16 -0.11026 0.27887 -0.395 0.6926
## flow_direction32 -0.41704 0.65981 -0.632 0.5273
## flow_direction64 0.29780 0.32953 0.904 0.3661
## flow_direction128 0.52511 0.32227 1.629 0.1032
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1359.72 on 2356 degrees of freedom
## Residual deviance: 827.79 on 2340 degrees of freedom
## AIC: 861.79
##
## Number of Fisher Scoring iterations: 11
The model output provides estimates of how each independent variable is associated with the probability that development occurs, holding all other variables constant.
Coefficients (Estimate):
Each coefficient represents the direction and strength of the relationship between a variable and the likelihood of development. Because the continuous predictors were standardized before fitting, most coefficients describe the change in log-odds associated with a one-standard-deviation increase in that variable. The orthogonal polynomial terms for distance to the CBD and the flow_direction categories (each compared with the reference code 1) are read differently.
Standard Error:
The standard error reflects the uncertainty around each coefficient estimate. Larger standard errors indicate less precise estimates.
z-value and p-value:
These values indicate whether the relationship between a variable and development is statistically distinguishable from zero. Smaller p-values suggest stronger evidence that a variable is associated with development.
Intercept:
The intercept represents the baseline log-odds of development when all numeric predictors are zero and categorical predictors are at their reference level. This value is often difficult to interpret directly, especially when zero is not a meaningful value for the predictors.
Predicted probabilities:
Because this is a logistic regression model, coefficients are estimated in terms of log-odds. These can be transformed into predicted probabilities, which are often easier to interpret and visualize, particularly in a spatial context.
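Exponentiating a log-odds coefficient gives an odds ratio, and the inverse logit (`plogis()` in base R) turns a linear predictor into a probability. The sketch below applies these conversions to three estimates copied from the model summary above. The "hypothetical hexagon" is an illustrative case, not an observed location.

```r
# Estimates copied from summary(logit_model_r)
b <- c(intercept = -5.74039, log_Area = 2.75297, avg_people_per_building = -1.30692)

# Odds ratios: multiplicative change in the odds of development
# for a one-standard-deviation increase in each (scaled) predictor
odds_ratio <- exp(b[-1])
odds_ratio

# Baseline probability implied by the intercept alone: plogis(x) = 1 / (1 + exp(-x))
p0 <- plogis(b[["intercept"]])

# Hypothetical hexagon: log_Area 1 SD above its mean, all other terms at zero/reference
p1 <- plogis(b[["intercept"]] + b[["log_Area"]])
c(baseline = p0, larger_area = p1)
```

Read this way, a one-SD increase in `avg_people_per_building` multiplies the odds of development by roughly 0.27, while a one-SD increase in `log_Area` multiplies them by roughly 15.7, holding the other variables constant. Odds ratios are multiplicative, so the implied change in probability depends on the starting probability.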
Key Question: Interpreting the Model
What do the coefficients tell us about the factors that may influence development in this region?
Which variables have positive coefficients and which have negative coefficients?
(Be careful in your interpretation: a negative coefficient means that as the variable increases, the probability of development decreases.)
What do the p-values tell us about the strength of evidence for each relationship?
Finally, what does it mean to interpret results in terms of log-odds, and why do we often convert them to predicted probabilities?
library(car)
# Generalized variance inflation factors for the fitted model
print(vif(logit_model_r))
## GVIF Df GVIF^(1/(2*Df))
## log_Area 2.698815 1 1.642807
## pop_growth 1.464855 1 1.210312
## poly(euclidean_CBD, 2) 2.726039 2 1.284941
## dist_road_30m 1.032850 1 1.016292
## slope 2.330648 1 1.526646
## avg_people_per_building 3.443195 1 1.855585
## DEM 2.504591 1 1.582590
## log_flood 1.570803 1 1.253317
## flow_direction 1.158424 7 1.010560
Before interpreting model results, it is important to check for multicollinearity among the independent variables. Multicollinearity occurs when predictors are highly correlated with one another, which can inflate standard errors and make coefficient estimates unstable. We typically assess this using correlation matrices or variance inflation factors (VIFs) to ensure that no single variable is overly redundant with others. In the output above, all GVIF values are below 3.5, which suggests that multicollinearity is unlikely to be a major concern for this model. You can read more about VIFs in R here:
https://cran.r-project.org/web/packages/car/vignettes/variance-inflation-factors.html
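A VIF can be computed by hand: regress each predictor on all the others and take 1 / (1 - R^2). The base-R sketch below does this for simulated predictors, two of which are deliberately correlated. `car::vif()` reports a generalized version (GVIF) that also handles multi-column terms such as `poly()` and factors; for one-degree-of-freedom terms, its last column, GVIF^(1/(2*Df)), is simply the square root of the VIF.

```r
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)  # strongly correlated with x1
x3 <- rnorm(n)                 # independent of both
preds <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the others
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    f  <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

round(vif_manual(preds), 2)
```

The correlated pair `x1` and `x2` receive large VIFs, while the independent `x3` stays close to 1.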
To assess how well the model generalizes to new, unseen data, we evaluate its performance on the test set. We use the fitted model to generate predicted probabilities of development for each observation, then compare these predictions to the observed outcomes. Model performance is summarized using the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC), where higher AUC values indicate better ability to distinguish between developed and non-developed locations.
library(pROC)
# Predicted probabilities for the held-out test hexagons
test_data$pred_prob <- predict(logit_model_r, newdata = test_data, type = "response")
roc_obj <- roc(test_data$is_developed, test_data$pred_prob)
auc_val <- auc(roc_obj)
print(paste("AUC on test set:", round(auc_val, 3)))
## [1] "AUC on test set: 0.918"
The Area Under the Curve (AUC) summarizes how well the model distinguishes between developed and non-developed locations.
In applied planning contexts, AUC values above ~0.7 are often considered useful, especially when combined with interpretability and domain knowledge.
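One useful way to read the AUC: it equals the probability that a randomly chosen developed location receives a higher predicted probability than a randomly chosen undeveloped one (ties count as half). The base-R sketch below computes it from ranks, using made-up predictions; pROC computes the same quantity in the analysis above.

```r
# AUC via ranks (Mann-Whitney form); ties receive average ranks, i.e. half credit
auc_rank <- function(observed, predicted) {
  r  <- rank(predicted)
  n1 <- sum(observed == 1)
  n0 <- sum(observed == 0)
  (sum(r[observed == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

observed  <- c(0, 0, 1, 1)
predicted <- c(0.10, 0.40, 0.35, 0.80)  # made-up predicted probabilities
auc_rank(observed, predicted)           # 3 of the 4 developed/undeveloped pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1 to perfect separation.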
The ROC curve visualizes the trade-off between true positive rate (sensitivity) and false positive rate (1 − specificity) across different probability thresholds. A curve that bows strongly toward the upper-left corner indicates better model performance.
# Plot ROC curve
plot(roc_obj, col = "blue", lwd = 2, main = "ROC Curve for Development Model")
abline(a = 0, b = 1, lty = 2, col = "gray")
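The ROC curve above is traced by sweeping a classification threshold: at each threshold, locations with predicted probability at or above it are labelled "developed," and the true positive rate and false positive rate are recorded. The sketch below computes a few such points for made-up predictions.

```r
observed  <- c(0, 0, 1, 1)
predicted <- c(0.10, 0.40, 0.35, 0.80)  # made-up predicted probabilities

roc_point <- function(threshold, observed, predicted) {
  classified <- as.integer(predicted >= threshold)  # 1 = classified as developed
  c(threshold = threshold,
    tpr = sum(classified == 1 & observed == 1) / sum(observed == 1),  # sensitivity
    fpr = sum(classified == 1 & observed == 0) / sum(observed == 0))  # 1 - specificity
}

# One ROC point per threshold, from strict to lenient
pts <- t(sapply(c(0.9, 0.5, 0.3, 0.0), roc_point,
                observed = observed, predicted = predicted))
pts
```

Lowering the threshold can only raise both rates, which is why the curve runs from (0, 0) to (1, 1).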

The mapped probabilities of development represent how similar each location is to areas that have developed in the past, given the variables included in the model. Areas with higher predicted probabilities share characteristics—such as accessibility, prior growth, or lower environmental constraint—that historically coincide with development.
These maps should be read comparatively rather than absolutely. A “high potential” area is not guaranteed to develop, nor is a “low potential” area unsuitable for growth. Instead, the maps help identify where development pressure may concentrate if existing patterns and constraints persist.
The predicted probabilities represent the model’s estimate of how likely each hexagon is to experience development, given its demographic, spatial, and environmental characteristics. Values closer to 1 indicate a higher likelihood of development, while values closer to 0 indicate a lower likelihood. Mapping these probabilities allows us to visualize spatial patterns of development potential and identify areas where development pressure may be higher or lower under current conditions.
hex_model$pred_prob <- predict(logit_model_r, newdata = hex_model, type = "response")
ggplot(hex_model) +
geom_sf(aes(fill = pred_prob), color = NA) +
labs(
title = "Predicted Probability of Development",
fill = "Probability"
) +
theme_minimal()

hex_model$future_develop_class <- cut(
hex_model$pred_prob,
breaks = c(0, 0.33, 0.66, 1),
labels = c("Low potential", "Medium potential", "High potential"),
include.lowest = TRUE
)
ggplot(hex_model) +
geom_sf(aes(fill = future_develop_class), color = NA) +
labs(
title = "Development Potential (Classified)",
fill = "Class"
) +
theme_minimal()
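The classification above uses fixed cut-offs at 0.33 and 0.66. When development is relatively rare, fixed cut-offs can place most hexagons in the lowest class. Because the maps are meant to be read comparatively, one alternative is to cut at quantiles of the predicted probabilities. The sketch below contrasts both approaches on made-up values; tercile breaks are an assumption for illustration, not part of the original analysis.

```r
# Made-up predicted probabilities for ten hexagons
pred_prob <- c(0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.12, 0.25, 0.45, 0.90)
class_labels <- c("Low potential", "Medium potential", "High potential")

# Fixed cut-offs, as in the classification above
fixed_class <- cut(pred_prob, breaks = c(0, 0.33, 0.66, 1),
                   labels = class_labels, include.lowest = TRUE)

# Relative classes based on terciles (cut() fails if the quantile breaks are not unique)
tercile_breaks <- quantile(pred_prob, probs = c(0, 1/3, 2/3, 1))
relative_class <- cut(pred_prob, breaks = tercile_breaks,
                      labels = class_labels, include.lowest = TRUE)

table(fixed_class)
table(relative_class)
```

Relative classes always fill each category, so "High potential" then means "higher than most other locations" rather than a probability above a fixed value.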

library(sf)
library(tmap)
# Interactive mode
tmap_mode("view")
# Reproject to lat/long for web basemaps (if needed)
hex_web <- st_transform(hex_model, 4326)
# (A) Continuous probability map (interactive + basemap)
tm_shape(hex_web) +
tm_basemap("OpenStreetMap") +
tm_polygons(
col = "pred_prob",
title = "Probability",
id = "pred_prob",
border.col = "gray80",
alpha = 0.8
) +
tm_view(set.view = c(lon = 32.5825, lat = 0.3476, zoom = 10)) + # adjust zoom (e.g., 10–14)
tm_layout(title = "Predicted Probability of Development")
tm_shape(hex_web) +
tm_basemap("OpenStreetMap") +
tm_polygons(
col = "future_develop_class",
title = "Class",
id = "future_develop_class",
alpha = 0.8,
border.col = "gray80"
) +
tm_view(set.view = c(lon = 32.5825, lat = 0.3476, zoom = 10)) + # adjust zoom (e.g., 10–14)
tm_layout(title = "Development Potential (Classified) – Kampala")
This analysis makes several simplifying assumptions that are important to acknowledge. Development is represented using a binary definition based on changes in built-up surface, which does not distinguish between infill, redevelopment, or informal expansion. Social, political, and institutional factors—such as land tenure, zoning, and governance—are not explicitly modeled, despite their importance in shaping urban outcomes.
Future extensions of this work could:
- Incorporate scenario-based changes in infrastructure or flood risk
- Distinguish between formal and informal development processes
- Engage local planners and communities to ground-truth assumptions
- Use the model outputs as inputs to participatory planning exercises
In the CLARS context, these results are best used as a starting point for discussion: Where might development pressure emerge, which areas appear most constrained by risk, and how could planning interventions shift these trajectories?